Psychophysical evaluation of PSOLA: natural versus synthetic speech
Authors
Abstract
This paper presents the results of psychophysical experiments dealing with pitch-marker positioning within the Pitch Synchronous OverLap and Add (PSOLA) framework. Sustained natural vowels were PSOLA-modified in fundamental frequency. The experiments were aimed at determining the auditory sensitivity to (1) deterministic shifts of either all or single pitch markers within a sequence, and (2) random shifts of all pitch markers (“jitter”). For deterministic shifts of all pitch markers, the results were in reasonable agreement with results obtained previously for synthetic formant signals. For deterministic shifts of single pitch markers, thresholds depended on position in the sequence. Detection thresholds for jittered shifts were comparable to thresholds for detecting jitter in pulse trains. The ranking of the thresholds for these three conditions indicated that the auditory system is more sensitive to dynamic (modulation) cues than to static (timbral) cues arising from shifts in pitch-marker positioning.
Similar resources
Analysis of the degradation of French vowels induced by the TD-PSOLA algorithm, in text-to-speech context
In concatenative speech synthesis systems, synthetic speech is obtained by concatenating acoustic units selected from a database of natural speech. The duration and fundamental frequency (F0) of the selected units are usually different from those requested by a prosodic model, and so some prosodic modification must be applied to the units in order to obtain the desired target. TD-PSOLA is an ef...
Concatenative Speech Synthesis: A Review
The primary objective of this paper is to provide an overview of existing concatenative text-to-speech synthesis techniques. Concatenative speech synthesis can be broadly divided into three categories: diphone-based, corpus-based, and hybrid. Diphone-based speech synthesis relies on different signal processing techniques such as PSOLA, FD-PSOLA, etc. These signal processing techniques introdu...
Evaluation of a Multilingual TTS System with Respect to the Prosodic Quality
Improving the naturalness of synthetic speech is an essential task in developing a text-to-speech (TTS) system. Mainly, it depends on the quality of the prosody model which is utilized in the TTS system. For our TTS system called DreSS (Dresden Speech Synthesizer), we compared three different methods for generating the F0 contour to each other as well as to other synthesizers. Natural speech sa...
A hybrid method oriented to concatenative text-to-speech synthesis
In this paper we present a speech synthesis method for diphone-based text-to-speech systems. Its main goal is to achieve prosodic modifications that result in more natural-sounding synthetic speech. This improvement is especially useful for emotional speech synthesis, which requires high-quality prosodic modification. We present a hybrid method based on TD-PSOLA and the harmonic plus noise model...
A new synthesis algorithm using phase information for TTS systems
New speech synthesis algorithms capable of flexible prosody (especially F0) modification are desired for a high quality TTS system. TD-PSOLA is the most popular synthesis algorithm. The algorithm shows very high quality when F0 modification is limited. However, the quality degradation due to pitch epoch detection error becomes severe as the F0 modification factor becomes large. On the othe...
Journal title:
Volume / Issue:
Pages: -
Publication date: 1997